Skip to content

fix: improve browser tool instructions and element resolution#351

Closed
jamiepine wants to merge 2 commits intomainfrom
fix/browser-worker-instructions
Closed

fix: improve browser tool instructions and element resolution#351
jamiepine wants to merge 2 commits intomainfrom
fix/browser-worker-instructions

Conversation

@jamiepine
Copy link
Member

@jamiepine jamiepine commented Mar 7, 2026

Summary

Workers keep making the same browser tool mistakes in a loop — forgetting act_kind, using open instead of navigate, failing to find elements. Three fixes:

  • Rewrote browser section in worker prompt with a parameter table, concrete JSON examples, explicit act_kind emphasis, and a "common mistakes" section
  • Multi-strategy element selector — native HTML elements (<button>, <a>, <input>) have implicit ARIA roles without [role] attributes in the DOM. Now tries: [role][aria-label]tag[aria-label]tag[title]tag[role] fallback
  • Improved all error messages to include valid parameter values and JSON examples so the LLM can self-correct instead of retrying the same broken call

Note

This PR improves browser tool usability by providing clearer documentation, better error guidance, and more robust element selection. Changes span prompt files (worker.md.j2, browser_description.md.j2) and implementation (browser.rs), with updated error messages that include JSON examples and parameter validation guidance for faster LLM self-correction.

Written by Tembo for commit 5aab401. This will update automatically on new commits.

- Rewrite browser section in worker prompt with parameter table,
  concrete JSON examples, and explicit act_kind emphasis
- Add multi-strategy element selector (native tags, aria-label,
  title attr) to handle implicit ARIA roles on native HTML elements
- Improve all error messages to include valid values and examples
- Update tool description to front-load act_kind requirement
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 7, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 3334afe9-b644-4461-8de8-d400c9b86898

📥 Commits

Reviewing files that changed from the base of the PR and between 5aab401 and 4e5d924.

📒 Files selected for processing (1)
  • src/tools/browser.rs

Walkthrough

This PR clarifies the browser tool workflow (launch → navigate → snapshot → act → close), expands action definitions and act requirements, and refactors element resolution in Rust to a JS-driven, multi-pattern selector strategy with new helper functions and improved error handling.

Changes

Cohort / File(s) Summary
Browser workflow docs
prompts/en/tools/browser_description.md.j2, prompts/en/worker.md.j2
Rewrote workflow to explicit sequence (launch → navigate → snapshot → act → close); added detailed Actions table, mandatory act_kind values and parameters, element ref semantics (refs reset on navigation), examples, and common-mistakes guidance.
Element resolution & actions (Rust)
src/tools/browser.rs
Replaced direct backend element resolution with JS-driven selector strategy: added build_js_selector, run_js_action, build_selectors_for_ref, role_to_native_tag; updated Act handlers (click, type, hover, scroll_into_view, focus, press_key) to execute JS selectors/actions, improved error messages, and changed element-scoped screenshot handling to use JS bounding-box computation. Removed legacy resolve_element_ref.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: improve browser tool instructions and element resolution' directly summarizes the main changes: documentation improvements and element resolution enhancements for the browser tool.
Description check ✅ Passed The description clearly explains the pull request objectives: reducing repeated worker mistakes through better documentation, improved element selection strategy, and enhanced error messages.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/browser-worker-instructions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/tools/browser.rs (2)

1419-1435: Consider extending textbox mapping to also try textarea.

The current mapping "textbox" => Some("input") misses <textarea> elements, which also have the implicit textbox role. This could reduce match rates for multi-line text inputs. Since build_selectors_for_ref iterates through selectors, the existing fallbacks should still work, but you could improve coverage by handling both.

💡 Potential enhancement (optional)

You could return a comma-separated selector or handle this in build_selectors_for_ref by pushing both input and textarea variants when role is textbox:

-        "textbox" => Some("input"),
+        "textbox" => Some("input, textarea"),

Or handle it at the build_selectors_for_ref level for finer control. This is a nice-to-have improvement that can be deferred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 1419 - 1435, The role_to_native_tag
function currently maps "textbox" only to "input", missing textarea elements;
update role_to_native_tag to include textarea as well (e.g., return a selector
that matches both input and textarea for the "textbox" case) or alternatively
modify build_selectors_for_ref to push both the input and textarea selectors
when it sees role "textbox" so multi-line textareas are covered by the selector
generation; locate function role_to_native_tag and adjust the "textbox" branch
(or update build_selectors_for_ref's handling of textbox) accordingly.

1442-1448: Consider more robust CSS escaping for element names.

The current escaping at line 1444 only handles single quotes. Element names containing backslashes, newlines, or null characters could break CSS selector parsing or cause unexpected matches. While accessible names from the AX tree are typically sanitized, a more defensive approach would be safer.

💡 Optional: More robust escaping
-    let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));
+    let escaped_name = elem_ref.name.as_ref().map(|n| {
+        n.replace('\\', "\\\\")
+            .replace('\'', "\\'")
+            .replace('\n', "\\n")
+    });

Alternatively, consider using CSS.escape() via the cssparser crate if names become more complex. This is low priority since AX tree names are typically well-formed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/browser.rs` around lines 1442 - 1448, The code in
build_selectors_for_ref only replaces single quotes when building escaped_name;
update it to perform robust CSS attribute-value escaping for elem_ref.name (or
call a library escape function) so backslashes, control characters (newlines,
nulls) and quotes are properly escaped before formatting selectors; implement a
small helper (e.g., escape_css_attr) or use a crate-provided escape (cssparser
or similar) and replace the current escaped_name computation and subsequent uses
in build_selectors_for_ref to use the new safe-escaped value.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/tools/browser.rs`:
- Around line 1419-1435: The role_to_native_tag function currently maps
"textbox" only to "input", missing textarea elements; update role_to_native_tag
to include textarea as well (e.g., return a selector that matches both input and
textarea for the "textbox" case) or alternatively modify build_selectors_for_ref
to push both the input and textarea selectors when it sees role "textbox" so
multi-line textareas are covered by the selector generation; locate function
role_to_native_tag and adjust the "textbox" branch (or update
build_selectors_for_ref's handling of textbox) accordingly.
- Around line 1442-1448: The code in build_selectors_for_ref only replaces
single quotes when building escaped_name; update it to perform robust CSS
attribute-value escaping for elem_ref.name (or call a library escape function)
so backslashes, control characters (newlines, nulls) and quotes are properly
escaped before formatting selectors; implement a small helper (e.g.,
escape_css_attr) or use a crate-provided escape (cssparser or similar) and
replace the current escaped_name computation and subsequent uses in
build_selectors_for_ref to use the new safe-escaped value.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0d7828a5-4ee2-4f5b-b9f6-6395fbf9ea99

📥 Commits

Reviewing files that changed from the base of the PR and between 7e78eee and 5aab401.

📒 Files selected for processing (3)
  • prompts/en/tools/browser_description.md.j2
  • prompts/en/worker.md.j2
  • src/tools/browser.rs

Comment on lines +109 to +110
| `act` | `act_kind`, `element_ref` | Interact with an element. **`act_kind` is mandatory.** |
| `screenshot` | `full_page` (optional) | Capture the viewport (or full page). |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit: the quick-reference table makes element_ref look required for all act kinds, but press_key can omit it. Also screenshot supports element_ref (element screenshot) per the tool args, but the table doesn’t mention it.

Suggested change
| `act` | `act_kind`, `element_ref` | Interact with an element. **`act_kind` is mandatory.** |
| `screenshot` | `full_page` (optional) | Capture the viewport (or full page). |
| `act` | `act_kind`, `element_ref` (except `press_key`) | Interact with an element. **`act_kind` is mandatory.** |
| `screenshot` | `element_ref` (optional), `full_page` (optional) | Capture the viewport, full page, or a specific element. |

@@ -1 +1 @@
Browser automation tool. Launch a headless Chrome browser, navigate pages, interact with elements, take screenshots, and extract page content. Workflow: launch → navigate → snapshot (get element refs) → act (click/type by ref) → screenshot. Element refs like "e1", "e2" are assigned during snapshot and used in act calls. No newline at end of file
Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus) and `element_ref` from the last snapshot. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs). No newline at end of file
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small consistency thing with the worker prompt: press_key doesn’t require an element_ref, so saying act requires element_ref can push workers into always inventing one.

Suggested change
Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus) and `element_ref` from the last snapshot. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs).
Browser automation tool. Workflow: launch → navigate → snapshot → act → close. The `act` action REQUIRES `act_kind` (click, type, press_key, hover, scroll_into_view, focus). For most act kinds, pass `element_ref` from the last snapshot; for `press_key`, pass `key` and optionally `element_ref`. Example: {"action": "act", "act_kind": "click", "element_ref": "e3"}. Always snapshot before acting — refs reset on navigation. Use `navigate` to go to URLs (not `open`, which creates new tabs).

Comment on lines +1362 to +1377
for selector in &selectors {
if let Ok(element) = page.find_element(selector).await {
return Ok(element);
}
}

Err(BrowserError::new(format!(
"failed to find element for ref '{ref_id}' \
(role: {}, name: {:?}) — the page may have changed since \
the last snapshot. Run snapshot again to get fresh refs, \
then retry with the updated ref. \
Selectors tried: {}",
elem_ref.role,
elem_ref.name,
selectors.join(", "),
)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we swallow all find_element errors and always return the same “page may have changed” message. If the selector is invalid (or CDP errors), snapshot+retry won’t help and we’ll just loop.

Could be worth capturing the last error and including it in the final message so the worker can course-correct faster.

Suggested change
for selector in &selectors {
if let Ok(element) = page.find_element(selector).await {
return Ok(element);
}
}
Err(BrowserError::new(format!(
"failed to find element for ref '{ref_id}' \
(role: {}, name: {:?}) — the page may have changed since \
the last snapshot. Run snapshot again to get fresh refs, \
then retry with the updated ref. \
Selectors tried: {}",
elem_ref.role,
elem_ref.name,
selectors.join(", "),
)))
let mut last_error: Option<String> = None;
for selector in &selectors {
match page.find_element(selector).await {
Ok(element) => return Ok(element),
Err(error) => last_error = Some(error.to_string()),
}
}
let last_error = last_error.unwrap_or_else(|| "unknown error".to_string());
Err(BrowserError::new(format!(
"failed to find element for ref '{ref_id}' \
(role: {}, name: {:?}) — the page may have changed since \
the last snapshot. Run snapshot again to get fresh refs, \
then retry with the updated ref. \
Selectors tried: {}. Last error: {last_error}",
elem_ref.role,
elem_ref.name,
selectors.join(", "),
)))

Comment on lines +1442 to +1460
fn build_selectors_for_ref(elem_ref: &ElementRef) -> Vec<String> {
let mut selectors = Vec::with_capacity(4);
let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));

// Strategy 1: [role='X'][aria-label='Y'] — explicit ARIA attributes
if let Some(name) = &escaped_name {
selectors.push(format!("[role='{}'][aria-label='{name}']", elem_ref.role));
}

if let Some(name) = &elem_ref.name {
// Escape single quotes in the name for CSS selector safety
let escaped = name.replace('\'', "\\'");
format!("{role_selector}[aria-label='{escaped}']")
} else {
role_selector
// Strategy 2: native tag + aria-label (e.g., button[aria-label='Submit'])
if let Some(tag) = role_to_native_tag(&elem_ref.role) {
if let Some(name) = &escaped_name {
selectors.push(format!("{tag}[aria-label='{name}']"));
}
}

// Strategy 3: native tag with text content match via XPath-style selector
// using the name as button/link text
if let Some(tag) = role_to_native_tag(&elem_ref.role) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the “XPath-style selector / text content match” comment doesn’t match what the code actually does (title + tag-only fallbacks). Also with_capacity(4) looks a bit low given the max selectors here is 5.

Suggested change
fn build_selectors_for_ref(elem_ref: &ElementRef) -> Vec<String> {
let mut selectors = Vec::with_capacity(4);
let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));
// Strategy 1: [role='X'][aria-label='Y'] — explicit ARIA attributes
if let Some(name) = &escaped_name {
selectors.push(format!("[role='{}'][aria-label='{name}']", elem_ref.role));
}
if let Some(name) = &elem_ref.name {
// Escape single quotes in the name for CSS selector safety
let escaped = name.replace('\'', "\\'");
format!("{role_selector}[aria-label='{escaped}']")
} else {
role_selector
// Strategy 2: native tag + aria-label (e.g., button[aria-label='Submit'])
if let Some(tag) = role_to_native_tag(&elem_ref.role) {
if let Some(name) = &escaped_name {
selectors.push(format!("{tag}[aria-label='{name}']"));
}
}
// Strategy 3: native tag with text content match via XPath-style selector
// using the name as button/link text
if let Some(tag) = role_to_native_tag(&elem_ref.role) {
fn build_selectors_for_ref(elem_ref: &ElementRef) -> Vec<String> {
let mut selectors = Vec::with_capacity(5);
let escaped_name = elem_ref.name.as_ref().map(|n| n.replace('\'', "\\'"));
// Strategy 1: [role='X'][aria-label='Y'] — explicit ARIA attributes
if let Some(name) = &escaped_name {
selectors.push(format!("[role='{}'][aria-label='{name}']", elem_ref.role));
}
// Strategy 2: native tag + aria-label (e.g., button[aria-label='Submit'])
if let Some(tag) = role_to_native_tag(&elem_ref.role) {
if let Some(name) = &escaped_name {
selectors.push(format!("{tag}[aria-label='{name}']"));
}
}
// Strategy 3: native tag + title fallback, then tag-only fallback
if let Some(tag) = role_to_native_tag(&elem_ref.role) {

Element clicks/type/hover/focus now use Runtime.evaluate with DOM methods
instead of CDP node IDs that go stale between snapshot and interaction.

- Rewrite handle_act to use JS injection for all ActKind variants
- Add build_js_selector: tries CSS selectors then text-content fallback
- Add run_js_action: executes JS via page.evaluate, parses JSON result
- Update handle_screenshot to use JS + clip viewport instead of Element
- Remove old resolve_element_ref (replaced by JS injection)
- Merge role_to_native_tag entries from both branches
@jamiepine jamiepine closed this Mar 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant